library(topGO)
library(knitr)   # to use kable()
library(limma)   # to use vennDiagram()
library(ggplot2)
TAG <- params$TAG
VAR <- params$VAR
ANNOTATION <- list(genes = '../2019-07-26/genes/annotation.txt',
                   isoforms = '../2019-07-26/transcripts/annotation.txt')
TAGDIR <- paste0('../2020-01-08/', TAG, '/')

Introduction

This is the enrichment analysis for genes regulated by regime. Because I quantified the association of expression with regime in two different ways, this document has two sections. Section Variance reports the enrichment analysis performed with genes ordered by the proportion of expression variance explained by regime. Note that some genes with low variance may still be highly associated with regime, even if the fold change between levels of this factor is low. Section Differential expression uses an ordering of genes based on the significance of the differential expression between levels of regime, which does depend on fold change.

Reading the data

Functional annotation is in 2019-07-26. I will also upload two lists of genes, with either proportion of variance explained by regime or p-value of differential expression test.

tagVariance <- read.table(paste0(TAGDIR, VAR, '_variance.txt'))
tagPValue   <- read.table(paste0(TAGDIR, VAR, '_pvalue.txt'))
annotation  <- read.table(ANNOTATION[[TAG]], col.names = c('tagname', 'goterms'))

To initialize the topGOdata object, I will need the gene list as a named numeric vector, where the names are the genes identifiers and the numeric values, either the portion of variance explained by regime or the p-values of the differential expression test. The structure() function adds attributes to an object.

Variance <- structure(tagVariance[,1], names = row.names(tagVariance))
PValues  <- structure(tagPValue[,1],   names = row.names(tagPValue))
rm(tagVariance, tagPValue)

There are 18598 genes scored with a variance portion and a differential expression p-value. It should actually be the exact same genes. The annotation data frame has more than one GO term for every tag, separated by |. I need a named list.

head(annotation)
##       tagname                                                           goterms
## 1 XLOC_000002                                  GO:0008417|GO:0016020|GO:0006486
## 2 XLOC_000009                                  GO:0043130|GO:0005515|GO:0043161
## 3 XLOC_000010                       GO:0005840|GO:0015935|GO:0003735|GO:0006412
## 4 XLOC_000015 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 5 XLOC_000021                       GO:0016567|GO:0006397|GO:0061630|GO:0008270
## 6 XLOC_000036                                  GO:0005272|GO:0006814|GO:0016020
allgenes2GO <- strsplit(as.character(annotation$goterms), "|", fixed = TRUE)
names(allgenes2GO) <- annotation$tagname
rm(annotation)

There are 9691 genes with GO annotations. But the differential expression analysis includes many more genes. In order to include the not-annotated genes in the enrichment analysis, to see if annotated or not annotated genes are more or less often differentially expressed, I should attribute a GO term to them. According to [http://geneontology.org/docs/faq/] nowadays we express lack of annotation by annotating to the root nodes, i.e. GO:0008150 biological_process, GO:0003674 molecular_function, and GO:0005575 cellular_component.

for (tag in unique(c(names(PValues), names(Variance)))) {
   if (! tag %in% names(allgenes2GO)) {
      allgenes2GO <- append(allgenes2GO,
         structure(list(c("GO:0008150", "GO:0003674", "GO:0005575")), names = tag))
   }
}

Using differential expression p-values

Building the topGO object

Creation of a topGO dataset is documented in section 4 of topGO’s the user manual: https://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf. I need to use the new command, and fill up the slots. The annot object must be a function that compiles “a list of GO terms such that each element in the list is a character vector containing all the gene identifiers that are mapped to the respective GO term.” There are several options, that you can check using help(annFUN.gene2GO), for example. The annFUN.gene2GO requires a gene2GO argument, which is the list of gene-to-GO terms I made before. The geneSelectionFun object is a function that selects the interesting (most significant) genes. It is required to perform Fisher’s exact test. The nodeSize is used to prune the GO hierarchy from the terms which have less than n annotated genes.

I generate three datasets, to analyse each of the three ontologies.

selection <- function(allScores) {return(allScores < 0.05)}
GOdata.BP <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'BP', 
   allGenes = PValues,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdata.MF <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'MF',
   allGenes = PValues,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdata.CC <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'CC',
   allGenes = PValues,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
   Num_Genes = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), numGenes),
   Num_GO_terms = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
Number of feasible genes or transcripts and number of GO terms used in each data set.
ontology Num_Genes Num_GO_terms
BP 15162 1154
MF 17429 576
CC 12940 291

Running the tests

There are more than one way to test for enrichment. Something that took me a while to understand is that not only there are different test statistics (Fisher’s exact test, Kolmogorov-Smirnov, and others) but also different algorithms: classic, elim, weight… The algorithms are ways to deal with the dependence structure among GO terms due to topology. Some algorithms are compatible with all statistics implemented in topGO. But the weight and the parentchild algorithms are only applicable to Fisher’s exact test. I am not interested in the classic algorithm, which treats GO nodes as independent, and therefore produces an excess of significant results. I will not use the Fisher’s exact test, because it dependes on an arbitrary threshold of significance on non-adjusted p-values.

BP.elim     <- runTest(GOdata.BP, algorithm = "elim",     statistic = "ks")
BP.weight01 <- runTest(GOdata.BP, algorithm = "weight01", statistic = "ks")
BP.lea      <- runTest(GOdata.BP, algorithm = "lea",      statistic = "ks")
MF.elim     <- runTest(GOdata.MF, algorithm = "elim",     statistic = "ks")
MF.weight01 <- runTest(GOdata.MF, algorithm = "weight01", statistic = "ks")
MF.lea      <- runTest(GOdata.MF, algorithm = "lea",      statistic = "ks")
CC.elim     <- runTest(GOdata.CC, algorithm = "elim",     statistic = "ks")
CC.weight01 <- runTest(GOdata.CC, algorithm = "weight01", statistic = "ks")
CC.lea      <- runTest(GOdata.CC, algorithm = "lea",      statistic = "ks")

ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
   algorithm = rep(c("elim", "weight01", "lea"), 3),
   TermsTested = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) length(score(X))),
   Significant = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) sum(score(X) < 0.01)))

kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.
ontology algorithm TermsTested Significant
BP elim 1154 34
BP weight01 1154 23
BP lea 1154 70
MF elim 576 37
MF weight01 576 31
MF lea 576 75
CC elim 291 19
CC weight01 291 11
CC lea 291 37
rm(ResultsSummary)

Results

The topGO package does not pay much attention to what terms are significant because they are overrepresented and which ones are underrepresented among the most differentially expressed genes. I think it’s worth separating them, to facilitate the biological interpretation. Note that not all terms listed in the tables below are significant. The scores for the three methods (elim, weight01 and lea) are non-corrected p-values.

Biological process

orderedTerms <- names(sort(score(BP.weight01)))
significant.weight01 <- score(BP.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(BP.lea)[orderedTerms] <= 0.01
significant.elim     <- score(BP.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.pvalue.sigTerms <- sigTerms

BP.all <- GenTable(GOdata.BP, elim=BP.elim, weight01=BP.weight01, lea=BP.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))

kable(
   BP.all[BP.all$Significant > BP.all$Expected,],
   caption = "Most over-represented terms of the Biological Process ontology.")
Most over-represented terms of the Biological Process ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0055085 transmembrane transport 500 260 185.43 1 6.9e-13 2.5e-12 7.8e-16
2 GO:0006508 proteolysis 442 217 163.92 2 1.2e-09 4.6e-08 4.7e-11
3 GO:0006468 protein phosphorylation 276 132 102.36 3 4.3e-06 3.7e-06 4.3e-06
4 GO:0055114 oxidation-reduction process 376 175 139.44 4 3.6e-05 2.2e-05 3.6e-05
5 GO:0006289 nucleotide-excision repair 13 11 4.82 6 0.00050 0.0005 0.00050
6 GO:0007160 cell-matrix adhesion 16 11 5.93 11 0.00118 0.0012 0.00118
7 GO:0007186 G protein-coupled receptor signaling pat… 206 94 76.40 12 0.00122 0.0014 0.00122
8 GO:0007165 signal transduction 544 229 201.75 31 0.00849 0.0016 6.3e-05
9 GO:0006813 potassium ion transport 56 29 20.77 7 0.00057 0.0017 0.00057
10 GO:0006355 regulation of transcription, DNA-templat… 355 158 131.66 5 0.00018 0.0023 0.00018
11 GO:0006470 protein dephosphorylation 58 31 21.51 8 0.00059 0.0023 0.00059
12 GO:0034220 ion transmembrane transport 147 74 54.52 10 0.00096 0.0032 0.00096
13 GO:0006979 response to oxidative stress 26 14 9.64 16 0.00343 0.0034 0.00343
14 GO:0005992 trehalose biosynthetic process 6 5 2.23 17 0.00352 0.0035 0.00352
15 GO:1901642 nucleoside transmembrane transport 7 6 2.60 19 0.00377 0.0038 0.00377
16 GO:0044271 cellular nitrogen compound biosynthetic … 783 300 290.38 906 0.74664 0.0040 0.82339
17 GO:0043161 proteasome-mediated ubiquitin-dependent … 45 23 16.69 23 0.00491 0.0049 0.00491
18 GO:0048870 cell motility 14 8 5.19 174 0.05915 0.0062 0.05915
19 GO:0003341 cilium movement 10 8 3.71 30 0.00770 0.0077 0.00770
21 GO:0060271 cilium assembly 46 28 17.06 9 0.00064 0.0088 0.00064
22 GO:0051726 regulation of cell cycle 43 19 15.95 411 0.22595 0.0090 0.36698
23 GO:0046942 carboxylic acid transport 20 14 7.42 24 0.00527 0.0093 0.00527
24 GO:0006367 transcription initiation from RNA polyme… 13 9 4.82 35 0.01065 0.0107 0.01065
25 GO:0007017 microtubule-based process 195 96 72.32 27 0.00656 0.0108 0.00110
26 GO:0007156 homophilic cell adhesion via plasma memb… 21 8 7.79 40 0.01237 0.0124 0.01237
27 GO:0008272 sulfate transport 13 8 4.82 42 0.01265 0.0126 0.01265
28 GO:0006383 transcription by RNA polymerase III 8 7 2.97 45 0.01296 0.0130 0.01296
29 GO:0043628 ncRNA 3’-end processing 6 4 2.23 48 0.01399 0.0140 0.01399
30 GO:0016998 cell wall macromolecule catabolic proces… 9 6 3.34 54 0.01555 0.0156 0.01555
31 GO:0006032 chitin catabolic process 9 6 3.34 55 0.01555 0.0156 0.01555
32 GO:0006821 chloride transport 8 6 2.97 64 0.01609 0.0161 0.01609
33 GO:0006241 CTP biosynthetic process 5 5 1.85 65 0.01614 0.0161 0.01614
34 GO:0046039 GTP metabolic process 5 5 1.85 70 0.01614 0.0161 0.01614
kable(
   BP.all[BP.all$Significant < BP.all$Expected,],
   caption = "Most under-represented terms of the Biological Process ontology.")
Most under-represented terms of the Biological Process ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
20 GO:1901566 organonitrogen compound biosynthetic pro… 432 139 160.21 690 0.50350 0.0079 0.89898
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term = Term(GOTERM[sigTerms]),
                 Definition = Definition(GOTERM[sigTerms]),
                 PValue=score(BP.weight01)[sigTerms]),
      caption = paste('Biological process terms significantly associated with',
                      VAR, 'according to all 3 algorithms', sep=' '))
Biological process terms significantly associated with regime according to all 3 algorithms
Term Definition PValue
GO:0055085 transmembrane transport The process in which a solute is transported across a lipid bilayer, from one side of a membrane to the other. 0.0000000
GO:0006508 proteolysis The hydrolysis of proteins into smaller polypeptides and/or amino acids by cleavage of their peptide bonds. 0.0000000
GO:0006468 protein phosphorylation The process of introducing a phosphate group on to a protein. 0.0000037
GO:0055114 oxidation-reduction process A metabolic process that results in the removal or addition of one or more electrons to or from a substance, with or without the concomitant removal or addition of a proton or protons. 0.0000223
GO:0006289 nucleotide-excision repair A DNA repair process in which a small region of the strand surrounding the damage is removed from the DNA helix as an oligonucleotide. The small gap left in the DNA helix is filled in by the sequential action of DNA polymerase and DNA ligase. Nucleotide excision repair recognizes a wide range of substrates, including damage caused by UV irradiation (pyrimidine dimers and 6-4 photoproducts) and chemicals (intrastrand cross-links and bulky adducts). 0.0004978
GO:0007160 cell-matrix adhesion The binding of a cell to the extracellular matrix via adhesion molecules. 0.0011770
GO:0007186 G protein-coupled receptor signaling pathway A series of molecular signals that proceeds with an activated receptor promoting the exchange of GDP for GTP on the alpha-subunit of an associated heterotrimeric G-protein complex. The GTP-bound activated alpha-G-protein then dissociates from the beta- and gamma-subunits to further transmit the signal within the cell. The pathway begins with receptor-ligand interaction, or for basal GPCR signaling the pathway begins with the receptor activating its G protein in the absence of an agonist, and ends with regulation of a downstream cellular process, e.g. transcription. The pathway can start from the plasma membrane, Golgi or nuclear membrane (PMID:24568158 and PMID:16902576). 0.0013843
GO:0007165 signal transduction The cellular process in which a signal is conveyed to trigger a change in the activity or state of a cell. Signal transduction begins with reception of a signal (e.g. a ligand binding to a receptor or receptor activation by a stimulus such as light), or for signal transduction in the absence of ligand, signal-withdrawal or the activity of a constitutively active receptor. Signal transduction ends with regulation of a downstream cellular process, e.g. regulation of transcription or regulation of a metabolic process. Signal transduction covers signaling from receptors located on the surface of the cell and signaling via molecules located within the cell. For signaling between cells, signal transduction is restricted to events at and within the receiving cell. 0.0015717
GO:0006813 potassium ion transport The directed movement of potassium ions (K+) into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. 0.0016705
GO:0006355 regulation of transcription, DNA-templated Any process that modulates the frequency, rate or extent of cellular DNA-templated transcription. 0.0023402
GO:0006470 protein dephosphorylation The process of removing one or more phosphoric residues from a protein. 0.0023475
GO:0034220 ion transmembrane transport A process in which an ion is transported across a membrane. 0.0032023
GO:0006979 response to oxidative stress Any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of oxidative stress, a state often resulting from exposure to high levels of reactive oxygen species, e.g. superoxide anions, hydrogen peroxide (H2O2), and hydroxyl radicals. 0.0034284
GO:0005992 trehalose biosynthetic process The chemical reactions and pathways resulting in the formation of trehalose, a disaccharide isomeric with sucrose and obtained from certain lichens and fungi. 0.0035202
GO:1901642 nucleoside transmembrane transport NA 0.0037723
GO:0043161 proteasome-mediated ubiquitin-dependent protein catabolic process The chemical reactions and pathways resulting in the breakdown of a protein or peptide by hydrolysis of its peptide bonds, initiated by the covalent attachment of ubiquitin, and mediated by the proteasome. 0.0049073
GO:0003341 cilium movement The directed, self-propelled movement of a cilium. 0.0077007
GO:0060271 cilium assembly The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. 0.0087982
GO:0046942 carboxylic acid transport The directed movement of carboxylic acids into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. Carboxylic acids are organic acids containing one or more carboxyl (COOH) groups or anions (COO-). 0.0092880

I think the GO graph is useful to see the relationship among the significant terms. But too large graphs are impossible to read. I don’t know how to split the graph in meaningful subgraphs.

showSigOfNodes(GOdata.BP, score(BP.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 226 
## Number of Edges = 427 
## 
## $complete.dag
## [1] "A graph with 226 nodes."

This is just a example of the most significant GO term:

showGroupDensity(GOdata.BP, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdata.BP, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdata.BP, orderedTerms[3], rm.one=FALSE)

Molecular function

orderedTerms <- names(sort(score(MF.weight01)))
significant.weight01 <- score(MF.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(MF.lea)[orderedTerms] <= 0.01
significant.elim     <- score(MF.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.pvalue.sigTerms <- sigTerms

MF.all <- GenTable(GOdata.MF, elim=MF.elim, weight01=MF.weight01, lea=MF.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))

kable(
   MF.all[MF.all$Significant > MF.all$Expected,],
   caption = "Most over-represented terms of the Molecular Function ontology.")
Most over-represented terms of the Molecular Function ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
GO:0005515 protein binding 2101 925 794.52 1 9.6e-14 1.5e-15 7.1e-16
GO:0005509 calcium ion binding 361 197 136.52 2 5.0e-12 5.0e-12 5.0e-12
GO:0005525 GTP binding 198 95 74.88 3 4.9e-05 4.9e-05 4.9e-05
GO:0004672 protein kinase activity 274 127 103.62 5 6.5e-05 0.00011 6.5e-05
GO:0003774 motor activity 131 69 49.54 10 0.00049 0.00013 0.00049
GO:0022857 transmembrane transporter activity 583 282 220.47 55 0.01674 0.00032 2.5e-09
GO:0005524 ATP binding 786 334 297.24 6 0.00035 0.00035 0.00035
GO:0005507 copper ion binding 18 13 6.81 8 0.00042 0.00042 0.00042
GO:0005200 structural constituent of cytoskeleton 17 11 6.43 9 0.00049 0.00049 0.00049
GO:0004181 metallocarboxypeptidase activity 19 14 7.19 11 0.00061 0.00061 0.00061
GO:0016491 oxidoreductase activity 417 188 157.69 50 0.01405 0.00071 0.00343
GO:0004252 serine-type endopeptidase activity 88 43 33.28 12 0.00072 0.00072 0.00072
GO:0061630 ubiquitin protein ligase activity 27 17 10.21 13 0.00113 0.00113 0.00113
GO:0005230 extracellular ligand-gated ion channel a… 54 28 20.42 51 0.01507 0.00127 0.01507
GO:0008138 protein tyrosine/serine/threonine phosph… 23 16 8.70 17 0.00181 0.00181 0.00181
GO:0004198 calcium-dependent cysteine-type endopept… 28 16 10.59 21 0.00257 0.00257 0.00257
GO:0003924 GTPase activity 138 67 52.19 22 0.00262 0.00262 0.00262
GO:0004601 peroxidase activity 34 19 12.86 40 0.01130 0.00274 0.01130
GO:0042626 ATPase activity, coupled to transmembran… 86 49 32.52 14 0.00116 0.00343 0.00116
GO:0005337 nucleoside transmembrane transporter act… 7 6 2.65 24 0.00406 0.00406 0.00406
GO:0016787 hydrolase activity 1382 648 522.62 20 0.00237 0.00478 7.8e-17
GO:0008017 microtubule binding 62 36 23.45 27 0.00524 0.00524 0.00524
GO:0016715 oxidoreductase activity, acting on paire… 14 10 5.29 28 0.00529 0.00529 0.00529
GO:0004222 metalloendopeptidase activity 73 40 27.61 29 0.00570 0.00570 0.00570
GO:0016840 carbon-nitrogen lyase activity 8 7 3.03 30 0.00572 0.00572 0.00572
GO:0030248 cellulose binding 7 6 2.65 31 0.00647 0.00647 0.00647
GO:0047631 ADP-ribose diphosphatase activity 6 5 2.27 32 0.00720 0.00720 0.00720
GO:0005506 iron ion binding 72 39 27.23 16 0.00144 0.00740 0.00144
GO:0005249 voltage-gated potassium channel activity 27 12 10.21 33 0.00748 0.00748 0.00748
GO:0004930 G protein-coupled receptor activity 180 85 68.07 18 0.00182 0.00775 0.00182
GO:0016791 phosphatase activity 100 53 37.82 66 0.02452 0.00996 0.00138
GO:0030246 carbohydrate binding 51 31 19.29 39 0.01068 0.01022 0.00071
GO:0020037 heme binding 95 49 35.93 42 0.01213 0.01213 0.01213
GO:0008092 cytoskeletal protein binding 144 77 54.46 37 0.00965 0.01257 0.00037
GO:0008271 secondary active sulfate transmembrane t… 13 8 4.92 46 0.01352 0.01352 0.01352
GO:0043565 sequence-specific DNA binding 146 61 55.21 98 0.04742 0.01536 0.04742
GO:0004568 chitinase activity 9 6 3.40 52 0.01589 0.01589 0.01589
kable(
   MF.all[MF.all$Significant < MF.all$Expected,],
   caption = "Most under-represented terms of the Molecular Function ontology.")
Most under-represented terms of the Molecular Function ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(MF.weight01)[sigTerms]),
      caption = paste('Molecular function terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Molecular function terms significantly associated with regime according to all 3 algorithms
Term Definition PValue
GO:0005515 protein binding Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules). 0.0000000
GO:0005509 calcium ion binding Interacting selectively and non-covalently with calcium ions (Ca2+). 0.0000000
GO:0005525 GTP binding Interacting selectively and non-covalently with GTP, guanosine triphosphate. 0.0000489
GO:0004672 protein kinase activity Catalysis of the phosphorylation of an amino acid residue in a protein, usually according to the reaction: a protein + ATP = a phosphoprotein + ADP. 0.0001111
GO:0003774 motor activity Catalysis of the generation of force resulting either in movement along a microfilament or microtubule, or in torque resulting in membrane scission, coupled to the hydrolysis of a nucleoside triphosphate. 0.0001264
GO:0005524 ATP binding Interacting selectively and non-covalently with ATP, adenosine 5’-triphosphate, a universally important coenzyme and enzyme regulator. 0.0003453
GO:0005507 copper ion binding Interacting selectively and non-covalently with copper (Cu) ions. 0.0004234
GO:0005200 structural constituent of cytoskeleton The action of a molecule that contributes to the structural integrity of a cytoskeletal structure. 0.0004903
GO:0004181 metallocarboxypeptidase activity Catalysis of the hydrolysis of C-terminal amino acid residues from a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. 0.0006050
GO:0004252 serine-type endopeptidase activity Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a catalytic mechanism that involves a catalytic triad consisting of a serine nucleophile that is activated by a proton relay involving an acidic residue (e.g. aspartate or glutamate) and a basic residue (usually histidine). 0.0007237
GO:0061630 ubiquitin protein ligase activity Catalysis of the transfer of ubiquitin to a substrate protein via the reaction X-ubiquitin + S -> X + S-ubiquitin, where X is either an E2 or E3 enzyme, the X-ubiquitin linkage is a thioester bond, and the S-ubiquitin linkage is an amide bond: an isopeptide bond between the C-terminal glycine of ubiquitin and the epsilon-amino group of lysine residues in the substrate or, in the linear extension of ubiquitin chains, a peptide bond the between the C-terminal glycine and N-terminal methionine of ubiquitin residues. 0.0011253
GO:0008138 protein tyrosine/serine/threonine phosphatase activity Catalysis of the reactions: protein serine + H2O = protein serine + phosphate; protein threonine phosphate + H2O = protein threonine + phosphate; and protein tyrosine phosphate + H2O = protein tyrosine + phosphate. 0.0018139
GO:0004198 calcium-dependent cysteine-type endopeptidase activity Catalysis of the hydrolysis of nonterminal peptide bonds in a polypeptide chain by a mechanism using a cysteine residue at the enzyme active center, and requiring the presence of calcium. 0.0025653
GO:0003924 GTPase activity Catalysis of the reaction: GTP + H2O = GDP + phosphate. 0.0026185
GO:0042626 ATPase activity, coupled to transmembrane movement of substances Catalysis of the reaction: ATP + H2O = ADP + phosphate, to directly drive the active transport of a substance across a membrane. 0.0034322
GO:0005337 nucleoside transmembrane transporter activity Enables the transfer of a nucleoside, a nucleobase linked to either beta-D-ribofuranose (ribonucleoside) or 2-deoxy-beta-D-ribofuranose, (a deoxyribonucleotide) from one side of a membrane to the other. 0.0040616
GO:0016787 hydrolase activity Catalysis of the hydrolysis of various bonds, e.g. C-O, C-N, C-C, phosphoric anhydride bonds, etc. Hydrolase is the systematic name for any enzyme of EC class 3. 0.0047809
GO:0008017 microtubule binding Interacting selectively and non-covalently with microtubules, filaments composed of tubulin monomers. 0.0052445
GO:0016715 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced ascorbate as one donor, and incorporation of one atom of oxygen Catalysis of an oxidation-reduction (redox) reaction in which hydrogen or electrons are transferred from reduced ascorbate and one other donor, and one atom of oxygen is incorporated into one donor. 0.0052901
GO:0004222 metalloendopeptidase activity Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. 0.0056977
GO:0016840 carbon-nitrogen lyase activity Catalysis of the release of ammonia or one of its derivatives, with the formation of a double bond or ring. Enzymes with this activity may catalyze the actual elimination of the ammonia, amine or amide, e.g. CH-CH(-NH-R) = C=CH- + NH2-R. Others, however, catalyze elimination of another component, e.g. water, which is followed by spontaneous reactions that lead to breakage of the C-N bond, e.g. L-serine ammonia-lyase (EC:4.3.1.17), so that the overall reaction is C(-OH)-CH(-NH2) = CH2-CO- + NH3, i.e. an elimination with rearrangement. The sub-subclasses of EC:4.3 are the ammonia-lyases (EC:4.3.1), lyases acting on amides, amidines, etc. (EC:4.3.2), the amine-lyases (EC:4.3.3), and other carbon-nitrogen lyases (EC:4.3.99). 0.0057185
GO:0030248 cellulose binding Interacting selectively and non-covalently with cellulose. 0.0064685
GO:0047631 ADP-ribose diphosphatase activity Catalysis of the reaction: ADP-ribose + H2O = AMP + D-ribose 5-phosphate. 0.0071963
GO:0005506 iron ion binding Interacting selectively and non-covalently with iron (Fe) ions. 0.0074013
GO:0005249 voltage-gated potassium channel activity Enables the transmembrane transfer of a potassium ion by a voltage-gated channel. A voltage-gated channel is a channel whose open state is dependent on the voltage across the membrane in which it is embedded. 0.0074770
GO:0004930 G protein-coupled receptor activity Combining with an extracellular signal and transmitting the signal across the membrane by activating an associated G-protein; promotes the exchange of GDP for GTP on the alpha subunit of a heterotrimeric G-protein complex. 0.0077529
showSigOfNodes(GOdata.MF, score(MF.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 142 
## Number of Edges = 191 
## 
## $complete.dag
## [1] "A graph with 142 nodes."
showGroupDensity(GOdata.MF, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdata.MF, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdata.MF, orderedTerms[3], rm.one=FALSE)

Cellular component

orderedTerms <- names(sort(score(CC.weight01)))
significant.weight01 <- score(CC.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(CC.lea)[orderedTerms] <= 0.01
significant.elim     <- score(CC.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.pvalue.sigTerms <- sigTerms

CC.all <- GenTable(GOdata.CC, elim=CC.elim, weight01=CC.weight01, lea=CC.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
   CC.all[CC.all$Significant > CC.all$Expected,],
   caption = "Most over-represented terms of the Cellular Component ontology.")
Most over-represented terms of the Cellular Component ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
GO:0016021 integral component of membrane 885 425 316.52 1 4.6e-16 1.4e-14 4.8e-22
GO:0016020 membrane 1565 707 559.72 3 1.5e-05 1.1e-05 5.5e-26
GO:0005887 integral component of plasma membrane 118 61 42.20 2 1.3e-05 4.1e-05 1.3e-07
GO:0016459 myosin complex 33 23 11.80 4 4.9e-05 4.9e-05 4.9e-05
GO:0098800 inner mitochondrial membrane protein com… 21 14 7.51 10 0.00191 0.00018 0.00191
GO:0008076 voltage-gated potassium channel complex 13 8 4.65 6 0.00072 0.00072 0.00072
GO:0090575 RNA polymerase II transcription factor c… 23 17 8.23 8 0.00087 0.00101 0.00087
GO:0005576 extracellular region 140 65 50.07 11 0.00209 0.00286 0.00209
GO:0005886 plasma membrane 170 84 60.80 5 0.00068 0.00416 1.1e-10
GO:0042555 MCM complex 10 6 3.58 17 0.00893 0.00893 0.00893
GO:0005634 nucleus 531 215 189.91 14 0.00673 0.00900 0.02945
GO:0005743 mitochondrial inner membrane 33 22 11.80 21 0.01125 0.01125 0.00023
GO:0098796 membrane protein complex 145 69 51.86 37 0.02782 0.01309 0.00525
GO:0005929 cilium 27 18 9.66 20 0.01078 0.01334 0.00176
GO:0044430 cytoskeletal part 152 73 54.36 45 0.03229 0.01571 0.01508
GO:0034704 calcium channel complex 5 4 1.79 23 0.01677 0.01677 0.01677
GO:0016591 RNA polymerase II, holoenzyme 20 14 7.15 9 0.00161 0.01836 0.00161
GO:0005681 spliceosomal complex 12 6 4.29 24 0.01853 0.01853 0.01853
GO:0031225 anchored component of membrane 6 4 2.15 30 0.02104 0.02104 0.02104
kable(
   CC.all[CC.all$Significant < CC.all$Expected,],
   caption = "Most under-represented terms of the Cellular Component ontology.")
Most under-represented terms of the Cellular Component ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(CC.weight01)[sigTerms]),
      caption = paste('Cellular component terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Cellular component terms significantly associated with regime according to all 3 algorithms
Term Definition PValue
GO:0016021 integral component of membrane The component of a membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. 0.0000000
GO:0016020 membrane A lipid bilayer along with all the proteins and protein complexes embedded in it an attached to it. 0.0000106
GO:0005887 integral component of plasma membrane The component of the plasma membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. 0.0000405
GO:0016459 myosin complex A protein complex, formed of one or more myosin heavy chains plus associated light chains and other proteins, that functions as a molecular motor; uses the energy of ATP hydrolysis to move actin filaments or to move vesicles or other cargo on fixed actin filaments; has magnesium-ATPase activity and binds actin. Myosin classes are distinguished based on sequence features of the motor, or head, domain, but also have distinct tail regions that are believed to bind specific cargoes. 0.0000485
GO:0098800 inner mitochondrial membrane protein complex Any protein complex that is part of the inner mitochondrial membrane. 0.0001819
GO:0008076 voltage-gated potassium channel complex A protein complex that forms a transmembrane channel through which potassium ions may cross a cell membrane in response to changes in membrane potential. 0.0007178
GO:0090575 RNA polymerase II transcription factor complex A transcription factor complex that acts at a regulatory region of a gene transcribed by RNA polymerase II. 0.0010086
GO:0005576 extracellular region The space external to the outermost structure of a cell. For cells without external protective or external encapsulating structures this refers to space outside of the plasma membrane. This term covers the host cell environment outside an intracellular parasite. 0.0028572
GO:0005886 plasma membrane The membrane surrounding a cell that separates the cell from its external environment. It consists of a phospholipid bilayer and associated proteins. 0.0041590
GO:0042555 MCM complex A hexameric protein complex required for the initiation and regulation of DNA replication. 0.0089259
showSigOfNodes(GOdata.CC, score(CC.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 76 
## Number of Edges = 133 
## 
## $complete.dag
## [1] "A graph with 76 nodes."
showGroupDensity(GOdata.CC, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdata.CC, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdata.CC, orderedTerms[3], rm.one=FALSE)

Using the portion of variance explained by regime

Building the topGO object

I need to generate the topGO objects again, using the alternative gene ordering, based on the proportion of expression-level variance explained by regime. I miss a way to inform the topGOdata object that the score in this case is reversed, with respect to p-values: the higher it is, the more differentially expressed the gene is. To make sure that GO terms are tested in the same way than when using p-values, I will just reverse the proportion of variance explained by regime to its complement. Taking this into account, the subset of interesting genes (selection() function) must be defined as the lowest 10% scores, which are the 10% genes with largest portion of variance explained by regime.

selection <- function(allScores) {return(allScores <= quantile(allScores, probs = 0.10))}
GOdataVar.BP <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'BP', 
   allGenes = 1.0 - Variance,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdataVar.MF <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'MF',
   allGenes = 1.0 - Variance,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdataVar.CC <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'CC',
   allGenes = 1.0 - Variance,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

library(knitr)
DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
   Num_Genes = sapply(list(GOdataVar.BP, GOdataVar.MF, GOdataVar.CC), numGenes),
   Num_GO_terms = sapply(list(GOdataVar.BP, GOdataVar.MF, GOdataVar.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
Number of feasible genes or transcripts and number of GO terms used in each data set.
ontology Num_Genes Num_GO_terms
BP 15162 1154
MF 17429 576
CC 12940 291

Running the tests

BPvar.elim     <- runTest(GOdataVar.BP, algorithm = "elim",     statistic = "ks")
BPvar.weight01 <- runTest(GOdataVar.BP, algorithm = "weight01", statistic = "ks")
BPvar.lea      <- runTest(GOdataVar.BP, algorithm = "lea",      statistic = "ks")
MFvar.elim     <- runTest(GOdataVar.MF, algorithm = "elim",     statistic = "ks")
MFvar.weight01 <- runTest(GOdataVar.MF, algorithm = "weight01", statistic = "ks")
MFvar.lea      <- runTest(GOdataVar.MF, algorithm = "lea",      statistic = "ks")
CCvar.elim     <- runTest(GOdataVar.CC, algorithm = "elim",     statistic = "ks")
CCvar.weight01 <- runTest(GOdataVar.CC, algorithm = "weight01", statistic = "ks")
CCvar.lea      <- runTest(GOdataVar.CC, algorithm = "lea",      statistic = "ks")

ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
   algorithm = rep(c("elim", "weight01", "lea"), 3),
   TermsTested = sapply(list(BPvar.elim, BPvar.weight01, BPvar.lea, MFvar.elim, MFvar.weight01, MFvar.lea, CCvar.elim, CCvar.weight01, CCvar.lea), function(X) length(score(X))),
   Significant = sapply(list(BPvar.elim, BPvar.weight01, BPvar.lea, MFvar.elim, MFvar.weight01, MFvar.lea, CCvar.elim, CCvar.weight01, CCvar.lea), function(X) sum(score(X) < 0.01)))

kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.
ontology algorithm TermsTested Significant
BP elim 1154 37
BP weight01 1154 29
BP lea 1154 69
MF elim 576 41
MF weight01 576 28
MF lea 576 75
CC elim 291 17
CC weight01 291 13
CC lea 291 35
rm(ResultsSummary)

Results

Biological process

orderedTerms <- names(sort(score(BPvar.weight01)))
significant.weight01 <- score(BPvar.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(BPvar.lea)[orderedTerms] <= 0.01
significant.elim     <- score(BPvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.variance.sigTerms <- sigTerms

BPvar.all <- GenTable(GOdataVar.BP, elim=BPvar.elim, weight01=BPvar.weight01, lea=BPvar.lea,
                      orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))

kable(
   BPvar.all[BPvar.all$Significant > BPvar.all$Expected,],
   caption = "Most over-represented terms of the Biological Process ontology.")
Most over-represented terms of the Biological Process ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0055085 transmembrane transport 500 83 48.38 1 1.8e-12 1.7e-11 1.4e-14
2 GO:0006508 proteolysis 442 78 42.77 2 1.5e-09 9.2e-10 1.6e-12
3 GO:0055114 oxidation-reduction process 376 45 36.38 4 1.4e-05 3.8e-06 1.4e-05
4 GO:0007186 G protein-coupled receptor signaling pat… 206 27 19.93 3 8.1e-06 4.9e-06 8.1e-06
5 GO:0006468 protein phosphorylation 276 38 26.70 6 3.8e-05 6.5e-05 3.8e-05
7 GO:0006355 regulation of transcription, DNA-templat… 355 36 34.35 7 0.00011 0.00024 0.00011
8 GO:0006289 nucleotide-excision repair 13 3 1.26 10 0.00034 0.00034 0.00034
9 GO:0007165 signal transduction 544 57 52.63 201 0.09401 0.00076 0.09401
11 GO:0006470 protein dephosphorylation 58 7 5.61 9 0.00022 0.00095 0.00022
12 GO:0048870 cell motility 14 5 1.35 65 0.01456 0.00192 0.01456
13 GO:0006812 cation transport 272 32 26.32 17 0.00395 0.00193 1.1e-05
14 GO:0003341 cilium movement 10 4 0.97 12 0.00196 0.00196 0.00196
15 GO:0060271 cilium assembly 46 6 4.45 8 0.00018 0.00208 0.00018
17 GO:0005992 trehalose biosynthetic process 6 4 0.58 14 0.00329 0.00329 0.00329
18 GO:0007018 microtubule-based movement 115 13 11.13 11 0.00176 0.00333 6.4e-05
19 GO:0006367 transcription initiation from RNA polyme… 13 5 1.26 16 0.00363 0.00363 0.00363
20 GO:0046039 GTP metabolic process 5 2 0.48 19 0.00431 0.00431 0.00431
21 GO:0046129 purine ribonucleoside biosynthetic proce… 5 1 0.48 20 0.00431 0.00431 0.00431
22 GO:0007160 cell-matrix adhesion 16 6 1.55 24 0.00545 0.00545 0.00545
23 GO:0070836 caveola assembly 7 1 0.68 27 0.00635 0.00635 0.00635
24 GO:0006821 chloride transport 8 1 0.77 28 0.00671 0.00671 0.00671
25 GO:0006979 response to oxidative stress 26 8 2.52 30 0.00720 0.00720 0.00720
26 GO:0043161 proteasome-mediated ubiquitin-dependent … 45 6 4.35 31 0.00757 0.00757 0.00757
27 GO:0051260 protein homooligomerization 25 3 2.42 33 0.00834 0.00834 0.00834
28 GO:0008272 sulfate transport 13 5 1.26 34 0.00873 0.00873 0.00873
29 GO:0042398 cellular modified amino acid biosyntheti… 17 3 1.64 575 0.43428 0.00906 0.43428
30 GO:0006520 cellular amino acid metabolic process 117 13 11.32 831 0.69747 0.01044 0.69747
31 GO:0006744 ubiquinone biosynthetic process 5 1 0.48 42 0.01115 0.01115 0.01115
32 GO:0016236 macroautophagy 6 2 0.58 47 0.01116 0.01116 0.01116
33 GO:0006241 CTP biosynthetic process 5 1 0.48 52 0.01249 0.01249 0.01249
34 GO:0005991 trehalose metabolic process 14 6 1.35 57 0.01256 0.01256 9.1e-05
35 GO:0006165 nucleoside diphosphate phosphorylation 16 3 1.55 272 0.15804 0.01282 0.15804
36 GO:0009206 purine ribonucleoside triphosphate biosy… 27 3 2.61 380 0.25487 0.01283 0.25487
kable(
   BPvar.all[BPvar.all$Significant < BPvar.all$Expected,],
   caption = "Most under-represented terms of the Biological Process ontology.")
Most under-represented terms of the Biological Process ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
6 GO:0006813 potassium ion transport 56 4 5.42 5 1.8e-05 6.8e-05 1.8e-05
10 GO:0034220 ion transmembrane transport 147 13 14.22 13 0.00274 0.00088 0.00274
16 GO:0044271 cellular nitrogen compound biosynthetic … 783 66 75.76 902 0.78861 0.00311 0.84915
37 GO:1901566 organonitrogen compound biosynthetic pro… 432 28 41.80 548 0.41388 0.01340 0.59848
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(BPvar.weight01)[sigTerms]),
      caption = paste('Biological process terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Biological process terms significantly associated with regime according to all 3 algorithms
Term Definition PValue
GO:0055085 transmembrane transport The process in which a solute is transported across a lipid bilayer, from one side of a membrane to the other. 0.0000000
GO:0006508 proteolysis The hydrolysis of proteins into smaller polypeptides and/or amino acids by cleavage of their peptide bonds. 0.0000000
GO:0055114 oxidation-reduction process A metabolic process that results in the removal or addition of one or more electrons to or from a substance, with or without the concomitant removal or addition of a proton or protons. 0.0000038
GO:0007186 G protein-coupled receptor signaling pathway A series of molecular signals that proceeds with an activated receptor promoting the exchange of GDP for GTP on the alpha-subunit of an associated heterotrimeric G-protein complex. The GTP-bound activated alpha-G-protein then dissociates from the beta- and gamma-subunits to further transmit the signal within the cell. The pathway begins with receptor-ligand interaction, or for basal GPCR signaling the pathway begins with the receptor activating its G protein in the absence of an agonist, and ends with regulation of a downstream cellular process, e.g. transcription. The pathway can start from the plasma membrane, Golgi or nuclear membrane (PMID:24568158 and PMID:16902576). 0.0000049
GO:0006468 protein phosphorylation The process of introducing a phosphate group on to a protein. 0.0000652
GO:0006813 potassium ion transport The directed movement of potassium ions (K+) into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. 0.0000680
GO:0006355 regulation of transcription, DNA-templated Any process that modulates the frequency, rate or extent of cellular DNA-templated transcription. 0.0002382
GO:0006289 nucleotide-excision repair A DNA repair process in which a small region of the strand surrounding the damage is removed from the DNA helix as an oligonucleotide. The small gap left in the DNA helix is filled in by the sequential action of DNA polymerase and DNA ligase. Nucleotide excision repair recognizes a wide range of substrates, including damage caused by UV irradiation (pyrimidine dimers and 6-4 photoproducts) and chemicals (intrastrand cross-links and bulky adducts). 0.0003450
GO:0034220 ion transmembrane transport A process in which an ion is transported across a membrane. 0.0008806
GO:0006470 protein dephosphorylation The process of removing one or more phosphoric residues from a protein. 0.0009474
GO:0006812 cation transport The directed movement of cations, atoms or small molecules with a net positive charge, into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. 0.0019254
GO:0003341 cilium movement The directed, self-propelled movement of a cilium. 0.0019603
GO:0060271 cilium assembly The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. 0.0020769
GO:0005992 trehalose biosynthetic process The chemical reactions and pathways resulting in the formation of trehalose, a disaccharide isomeric with sucrose and obtained from certain lichens and fungi. 0.0032902
GO:0007018 microtubule-based movement A microtubule-based process that results in the movement of organelles, other microtubules, or other cellular components. Examples include motor-driven movement along microtubules and movement driven by polymerization or depolymerization of microtubules. 0.0033296
GO:0006367 transcription initiation from RNA polymerase II promoter Any process involved in the assembly of the RNA polymerase II preinitiation complex (PIC) at an RNA polymerase II promoter region of a DNA template, resulting in the subsequent synthesis of RNA from that promoter. The initiation phase includes PIC assembly and the formation of the first few bonds in the RNA chain, including abortive initiation, which occurs when the first few nucleotides are repeatedly synthesized and then released. Promoter clearance, or release, is the transition between the initiation and elongation phases of transcription. 0.0036348
GO:0046039 GTP metabolic process The chemical reactions and pathways involving GTP, guanosine triphosphate. 0.0043146
GO:0046129 purine ribonucleoside biosynthetic process The chemical reactions and pathways resulting in the formation of any purine ribonucleoside, a nucleoside in which purine base is linked to a ribose (beta-D-ribofuranose) molecule. 0.0043146
GO:0007160 cell-matrix adhesion The binding of a cell to the extracellular matrix via adhesion molecules. 0.0054525
GO:0070836 caveola assembly The aggregation, arrangement and bonding together of a set of components to form a caveola. A caveola is a plasma membrane raft that forms a small pit, depression, or invagination that communicates with the outside of a cell and extends inward, indenting the cytoplasm and the cell membrane. 0.0063465
GO:0006821 chloride transport The directed movement of chloride into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. 0.0067108
GO:0006979 response to oxidative stress Any process that results in a change in state or activity of a cell or an organism (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of oxidative stress, a state often resulting from exposure to high levels of reactive oxygen species, e.g. superoxide anions, hydrogen peroxide (H2O2), and hydroxyl radicals. 0.0072014
GO:0043161 proteasome-mediated ubiquitin-dependent protein catabolic process The chemical reactions and pathways resulting in the breakdown of a protein or peptide by hydrolysis of its peptide bonds, initiated by the covalent attachment of ubiquitin, and mediated by the proteasome. 0.0075703
GO:0051260 protein homooligomerization The process of creating protein oligomers, compounds composed of a small number, usually between three and ten, of identical component monomers. Oligomers may be formed by the polymerization of a number of monomers or the depolymerization of a large protein polymer. 0.0083350
GO:0008272 sulfate transport The directed movement of sulfate into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. 0.0087330
showSigOfNodes(GOdataVar.BP, score(BPvar.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 233 
## Number of Edges = 447 
## 
## $complete.dag
## [1] "A graph with 233 nodes."

Below I plot variance portion, but for the termo found most significant when using p-values, for comparison.

showGroupDensity(GOdataVar.BP, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdataVar.BP, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdataVar.BP, orderedTerms[3], rm.one=FALSE)

Molecular function

orderedTerms <- names(sort(score(MFvar.weight01)))
significant.weight01 <- score(MFvar.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(MFvar.lea)[orderedTerms] <= 0.01
significant.elim     <- score(MFvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.variance.sigTerms <- sigTerms

MFvar.all <- GenTable(GOdataVar.MF, elim=MFvar.elim, weight01=MFvar.weight01, lea=MFvar.lea,
                      orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(
   MFvar.all[MFvar.all$Significant > MFvar.all$Expected,],
   caption = "Most over-represented terms of the Molecular Function ontology.")
Most over-represented terms of the Molecular Function ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0005515 protein binding 2101 233 209.27 1 5.3e-14 4.7e-16 1.1e-14
2 GO:0005509 calcium ion binding 361 67 35.96 2 9.2e-13 9.2e-13 9.2e-13
3 GO:0003774 motor activity 131 23 13.05 4 1.2e-05 1.2e-05 6.3e-06
4 GO:0004930 G protein-coupled receptor activity 180 22 17.93 3 4.5e-06 6.3e-05 4.5e-06
5 GO:0004252 serine-type endopeptidase activity 88 21 8.77 5 6.8e-05 6.8e-05 6.8e-05
6 GO:0005524 ATP binding 786 83 78.29 6 0.00020 0.00020 0.00020
7 GO:0004672 protein kinase activity 274 37 27.29 16 0.00210 0.00042 0.00210
8 GO:0022857 transmembrane transporter activity 583 86 58.07 13 0.00108 0.00045 8.4e-11
9 GO:0042626 ATPase activity, coupled to transmembran… 86 12 8.57 11 0.00079 0.00058 0.00079
10 GO:0004198 calcium-dependent cysteine-type endopept… 28 5 2.79 8 0.00059 0.00059 0.00059
11 GO:0005525 GTP binding 198 22 19.72 9 0.00070 0.00070 0.00070
12 GO:0061630 ubiquitin protein ligase activity 27 4 2.69 12 0.00104 0.00104 0.00104
13 GO:0004181 metallocarboxypeptidase activity 19 5 1.89 14 0.00147 0.00147 0.00147
14 GO:0016787 hydrolase activity 1382 218 137.65 269 0.26288 0.00207 5.8e-18
15 GO:0005230 extracellular ligand-gated ion channel a… 54 9 5.38 20 0.00253 0.00231 0.00253
17 GO:0005507 copper ion binding 18 6 1.79 21 0.00259 0.00259 0.00259
18 GO:0051015 actin filament binding 30 9 2.99 27 0.00350 0.00350 0.00350
19 GO:0016840 carbon-nitrogen lyase activity 8 2 0.80 28 0.00381 0.00381 0.00381
20 GO:0004550 nucleoside diphosphate kinase activity 5 1 0.50 31 0.00482 0.00482 0.00482
21 GO:0004601 peroxidase activity 34 8 3.39 10 0.00078 0.00491 0.00078
22 GO:0030246 carbohydrate binding 51 14 5.08 22 0.00259 0.00546 0.00259
23 GO:0005200 structural constituent of cytoskeleton 17 7 1.69 33 0.00599 0.00599 0.00599
24 GO:0016491 oxidoreductase activity 417 49 41.53 26 0.00340 0.00661 0.00340
26 GO:0016772 transferase activity, transferring phosp… 459 54 45.72 315 0.35545 0.00758 0.59000
27 GO:0003777 microtubule motor activity 98 10 9.76 37 0.00833 0.00833 0.00833
28 GO:0047631 ADP-ribose diphosphatase activity 6 2 0.60 38 0.00847 0.00847 0.00847
29 GO:0008271 secondary active sulfate transmembrane t… 13 5 1.29 43 0.01015 0.01015 0.01015
30 GO:0020037 heme binding 95 14 9.46 45 0.01045 0.01045 0.01045
31 GO:0008092 cytoskeletal protein binding 144 21 14.34 85 0.02962 0.01068 0.00102
32 GO:0005506 iron ion binding 72 13 7.17 18 0.00228 0.01292 0.00228
34 GO:0004983 neuropeptide Y receptor activity 9 3 0.90 54 0.01360 0.01360 0.01360
35 GO:0070006 metalloaminopeptidase activity 5 2 0.50 56 0.01367 0.01367 0.01367
36 GO:0008138 protein tyrosine/serine/threonine phosph… 23 4 2.29 57 0.01394 0.01394 0.01394
37 GO:0004866 endopeptidase inhibitor activity 17 3 1.69 25 0.00332 0.01490 0.00332
38 GO:0004555 alpha,alpha-trehalase activity 8 2 0.80 58 0.01496 0.01496 0.01496
39 GO:0008324 cation transmembrane transporter activit… 284 32 28.29 66 0.01772 0.01601 0.00015
40 GO:0005452 inorganic anion exchanger activity 6 2 0.60 65 0.01711 0.01711 0.01711
41 GO:0008137 NADH dehydrogenase (ubiquinone) activity 12 2 1.20 67 0.01904 0.01904 0.01904
kable(
   MFvar.all[MFvar.all$Significant < MFvar.all$Expected,],
   caption = "Most under-represented terms of the Molecular Function ontology.")
Most under-represented terms of the Molecular Function ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
16 GO:0005249 voltage-gated potassium channel activity 27 1 2.69 19 0.00246 0.00246 0.00246
25 GO:0043565 sequence-specific DNA binding 146 9 14.54 87 0.03060 0.00745 0.03060
33 GO:0008017 microtubule binding 62 5 6.18 53 0.01355 0.01355 0.01355
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(MFvar.weight01)[sigTerms]),
      caption = paste('Molecular functions terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Molecular functions terms significantly associated with regime according to all 3 algorithms
Term Definition PValue
GO:0005515 protein binding Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules). 0.0000000
GO:0005509 calcium ion binding Interacting selectively and non-covalently with calcium ions (Ca2+). 0.0000000
GO:0003774 motor activity Catalysis of the generation of force resulting either in movement along a microfilament or microtubule, or in torque resulting in membrane scission, coupled to the hydrolysis of a nucleoside triphosphate. 0.0000119
GO:0004930 G protein-coupled receptor activity Combining with an extracellular signal and transmitting the signal across the membrane by activating an associated G-protein; promotes the exchange of GDP for GTP on the alpha subunit of a heterotrimeric G-protein complex. 0.0000627
GO:0004252 serine-type endopeptidase activity Catalysis of the hydrolysis of internal, alpha-peptide bonds in a polypeptide chain by a catalytic mechanism that involves a catalytic triad consisting of a serine nucleophile that is activated by a proton relay involving an acidic residue (e.g. aspartate or glutamate) and a basic residue (usually histidine). 0.0000681
GO:0005524 ATP binding Interacting selectively and non-covalently with ATP, adenosine 5’-triphosphate, a universally important coenzyme and enzyme regulator. 0.0001974
GO:0004672 protein kinase activity Catalysis of the phosphorylation of an amino acid residue in a protein, usually according to the reaction: a protein + ATP = a phosphoprotein + ADP. 0.0004160
GO:0022857 transmembrane transporter activity Enables the transfer of a substance, usually a specific substance or a group of related substances, from one side of a membrane to the other. 0.0004494
GO:0042626 ATPase activity, coupled to transmembrane movement of substances Catalysis of the reaction: ATP + H2O = ADP + phosphate, to directly drive the active transport of a substance across a membrane. 0.0005768
GO:0004198 calcium-dependent cysteine-type endopeptidase activity Catalysis of the hydrolysis of nonterminal peptide bonds in a polypeptide chain by a mechanism using a cysteine residue at the enzyme active center, and requiring the presence of calcium. 0.0005902
GO:0005525 GTP binding Interacting selectively and non-covalently with GTP, guanosine triphosphate. 0.0006978
GO:0061630 ubiquitin protein ligase activity Catalysis of the transfer of ubiquitin to a substrate protein via the reaction X-ubiquitin + S -> X + S-ubiquitin, where X is either an E2 or E3 enzyme, the X-ubiquitin linkage is a thioester bond, and the S-ubiquitin linkage is an amide bond: an isopeptide bond between the C-terminal glycine of ubiquitin and the epsilon-amino group of lysine residues in the substrate or, in the linear extension of ubiquitin chains, a peptide bond the between the C-terminal glycine and N-terminal methionine of ubiquitin residues. 0.0010360
GO:0004181 metallocarboxypeptidase activity Catalysis of the hydrolysis of C-terminal amino acid residues from a polypeptide chain by a mechanism in which water acts as a nucleophile, one or two metal ions hold the water molecule in place, and charged amino acid side chains are ligands for the metal ions. 0.0014696
GO:0005230 extracellular ligand-gated ion channel activity Enables the transmembrane transfer of an ion by a channel that opens when a specific extracellular ligand has been bound by the channel complex or one of its constituent parts. 0.0023052
GO:0005249 voltage-gated potassium channel activity Enables the transmembrane transfer of a potassium ion by a voltage-gated channel. A voltage-gated channel is a channel whose open state is dependent on the voltage across the membrane in which it is embedded. 0.0024622
GO:0005507 copper ion binding Interacting selectively and non-covalently with copper (Cu) ions. 0.0025881
GO:0051015 actin filament binding Interacting selectively and non-covalently with an actin filament, also known as F-actin, a helical filamentous polymer of globular G-actin subunits. 0.0034984
GO:0016840 carbon-nitrogen lyase activity Catalysis of the release of ammonia or one of its derivatives, with the formation of a double bond or ring. Enzymes with this activity may catalyze the actual elimination of the ammonia, amine or amide, e.g. CH-CH(-NH-R) = C=CH- + NH2-R. Others, however, catalyze elimination of another component, e.g. water, which is followed by spontaneous reactions that lead to breakage of the C-N bond, e.g. L-serine ammonia-lyase (EC:4.3.1.17), so that the overall reaction is C(-OH)-CH(-NH2) = CH2-CO- + NH3, i.e. an elimination with rearrangement. The sub-subclasses of EC:4.3 are the ammonia-lyases (EC:4.3.1), lyases acting on amides, amidines, etc. (EC:4.3.2), the amine-lyases (EC:4.3.3), and other carbon-nitrogen lyases (EC:4.3.99). 0.0038068
GO:0004550 nucleoside diphosphate kinase activity Catalysis of the reaction: ATP + nucleoside diphosphate = ADP + nucleoside triphosphate. 0.0048219
GO:0004601 peroxidase activity Catalysis of the reaction: donor + hydrogen peroxide = oxidized donor + 2 H2O. 0.0049118
GO:0030246 carbohydrate binding Interacting selectively and non-covalently with any carbohydrate, which includes monosaccharides, oligosaccharides and polysaccharides as well as substances derived from monosaccharides by reduction of the carbonyl group (alditols), by oxidation of one or more hydroxy groups to afford the corresponding aldehydes, ketones, or carboxylic acids, or by replacement of one or more hydroxy group(s) by a hydrogen atom. Cyclitols are generally not regarded as carbohydrates. 0.0054564
GO:0005200 structural constituent of cytoskeleton The action of a molecule that contributes to the structural integrity of a cytoskeletal structure. 0.0059853
GO:0016491 oxidoreductase activity Catalysis of an oxidation-reduction (redox) reaction, a reversible chemical reaction in which the oxidation state of an atom or atoms within a molecule is altered. One substrate acts as a hydrogen or electron donor and becomes oxidized, while the other acts as hydrogen or electron acceptor and becomes reduced. 0.0066083
GO:0003777 microtubule motor activity Catalysis of movement along a microtubule, coupled to the hydrolysis of a nucleoside triphosphate (usually ATP). 0.0083292
GO:0047631 ADP-ribose diphosphatase activity Catalysis of the reaction: ADP-ribose + H2O = AMP + D-ribose 5-phosphate. 0.0084694
showSigOfNodes(GOdataVar.MF, score(MFvar.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 158 
## Number of Edges = 211 
## 
## $complete.dag
## [1] "A graph with 158 nodes."

I plot variance portion, but for the term found most significant when using p-values, for comparison.

showGroupDensity(GOdataVar.MF, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdataVar.MF, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdataVar.MF, orderedTerms[3], rm.one=FALSE)

Cellular component

orderedTerms <- names(sort(score(CCvar.weight01)))
significant.weight01 <- score(CCvar.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(CCvar.lea)[orderedTerms] <= 0.01
significant.elim     <- score(CCvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
if (length(sigTerms) == 0) {
   sigTerms <- orderedTerms[significant.lea & significant.elim]
}
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.variance.sigTerms <- sigTerms

CCvar.all <- GenTable(GOdataVar.CC, elim=CCvar.elim, weight01=CCvar.weight01, lea=CCvar.lea,
                      orderBy="weight01", ranksOf="elim", topNodes=max(sum(significant.elim), 10))

kable(
   CCvar.all[CCvar.all$Significant > CCvar.all$Expected,],
   caption = "Most over-represented terms of the Cellular Component ontology.")
Most over-represented terms of the Cellular Component ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
1 GO:0016021 integral component of membrane 885 126 80.91 1 9.2e-20 1.6e-18 1.3e-24
2 GO:0016020 membrane 1565 199 143.08 2 3.0e-08 1.1e-07 < 1e-30
3 GO:0016459 myosin complex 33 13 3.02 3 6.0e-06 6.0e-06 6.0e-06
5 GO:0005887 integral component of plasma membrane 118 16 10.79 5 0.00030 0.00096 2.9e-06
6 GO:0005576 extracellular region 140 23 12.80 7 0.00092 0.00111 0.00092
7 GO:0090575 RNA polymerase II transcription factor c… 23 5 2.10 11 0.00232 0.00148 0.00232
9 GO:0005886 plasma membrane 170 20 15.54 12 0.00241 0.00259 2.9e-08
10 GO:0005743 mitochondrial inner membrane 33 6 3.02 9 0.00111 0.00355 0.00111
11 GO:0005929 cilium 27 4 2.47 6 0.00091 0.00366 0.00014
13 GO:0098800 inner mitochondrial membrane protein com… 21 2 1.92 23 0.01870 0.00960 0.01870
14 GO:0016591 RNA polymerase II, holoenzyme 20 3 1.83 13 0.00262 0.01302 0.00262
16 GO:0030176 integral component of endoplasmic reticu… 15 2 1.37 42 0.04594 0.01619 0.04594
17 GO:0005764 lysosome 8 2 0.73 19 0.01668 0.01668 0.01668
kable(
   CCvar.all[CCvar.all$Significant < CCvar.all$Expected,],
   caption = "Most under-represented terms of the Cellular Component ontology.")
Most under-represented terms of the Cellular Component ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
4 GO:0008076 voltage-gated potassium channel complex 13 1 1.19 4 7.1e-05 7.1e-05 7.1e-05
8 GO:0005634 nucleus 531 43 48.55 10 0.00137 0.00255 0.00011
12 GO:0005741 mitochondrial outer membrane 16 0 1.46 17 0.00922 0.00922 0.00922
15 GO:0098797 plasma membrane protein complex 34 2 3.11 30 0.03400 0.01547 5.0e-05
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(CCvar.weight01)[sigTerms]),
      caption = paste('Cellular component terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Cellular component terms significantly associated with regime according to all 3 algorithms
Term Definition PValue
GO:0016021 integral component of membrane The component of a membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. 0.0000000
GO:0016020 membrane A lipid bilayer along with all the proteins and protein complexes embedded in it an attached to it. 0.0000001
GO:0016459 myosin complex A protein complex, formed of one or more myosin heavy chains plus associated light chains and other proteins, that functions as a molecular motor; uses the energy of ATP hydrolysis to move actin filaments or to move vesicles or other cargo on fixed actin filaments; has magnesium-ATPase activity and binds actin. Myosin classes are distinguished based on sequence features of the motor, or head, domain, but also have distinct tail regions that are believed to bind specific cargoes. 0.0000060
GO:0008076 voltage-gated potassium channel complex A protein complex that forms a transmembrane channel through which potassium ions may cross a cell membrane in response to changes in membrane potential. 0.0000713
GO:0005887 integral component of plasma membrane The component of the plasma membrane consisting of the gene products and protein complexes having at least some part of their peptide sequence embedded in the hydrophobic region of the membrane. 0.0009560
GO:0005576 extracellular region The space external to the outermost structure of a cell. For cells without external protective or external encapsulating structures this refers to space outside of the plasma membrane. This term covers the host cell environment outside an intracellular parasite. 0.0011084
GO:0090575 RNA polymerase II transcription factor complex A transcription factor complex that acts at a regulatory region of a gene transcribed by RNA polymerase II. 0.0014827
GO:0005634 nucleus A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell’s chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent. 0.0025467
GO:0005886 plasma membrane The membrane surrounding a cell that separates the cell from its external environment. It consists of a phospholipid bilayer and associated proteins. 0.0025944
GO:0005743 mitochondrial inner membrane The inner, i.e. lumen-facing, lipid bilayer of the mitochondrial envelope. It is highly folded to form cristae. 0.0035516
GO:0005929 cilium A specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface and of some cytoplasmic parts. Each cilium is largely bounded by an extrusion of the cytoplasmic (plasma) membrane, and contains a regular longitudinal array of microtubules, anchored to a basal body. 0.0036627
GO:0005741 mitochondrial outer membrane The outer, i.e. cytoplasm-facing, lipid bilayer of the mitochondrial envelope. 0.0092228
showSigOfNodes(GOdataVar.CC, score(CCvar.weight01),
               firstSigNodes = sum(significant.weight01),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 63 
## Number of Edges = 115 
## 
## $complete.dag
## [1] "A graph with 63 nodes."

For comparison, I plot the distribution of variance portion for the CC term found most significant when using p-values.

showGroupDensity(GOdataVar.CC, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdataVar.CC, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdataVar.CC, orderedTerms[3], rm.one=FALSE)

Comparison between the two ordering of genes.

Biological process

allTerms <- usedGO(GOdata.BP)
vennDiagram(vennCounts(cbind(PValue   = allTerms %in% BP.pvalue.sigTerms,
                             Variance = allTerms %in% BP.variance.sigTerms)))

ggplot(data = data.frame(PValue   = rank(score(BP.weight01))[allTerms],
                         Variance = rank(score(BPvar.weight01))[allTerms]),
       mapping = aes(x = PValue, y = Variance)) +
   geom_point() + geom_smooth(method='lm') + xlab('Using gene p-values') +
   ylab('Using portion of explained variance') +
   ggtitle('Ordering of BP terms by significance')

Molecular function

allTerms <- usedGO(GOdata.MF)
vennDiagram(vennCounts(cbind(PValue   = allTerms %in% MF.pvalue.sigTerms,
                             Variance = allTerms %in% MF.variance.sigTerms)))

ggplot(data = data.frame(PValue   = rank(score(MF.weight01))[allTerms],
                         Variance = rank(score(MFvar.weight01))[allTerms]),
       mapping = aes(x = PValue, y = Variance)) +
   geom_point() + geom_smooth(methdo='lm') + xlab('Using gene p-values') +
   ylab('Using portion of explained variance') +
   ggtitle('Ordering of MF terms by significance')
## Warning: Ignoring unknown parameters: methdo

Cellular component

allTerms <- usedGO(GOdata.CC)
vennDiagram(vennCounts(cbind(PValue   = allTerms %in% CC.pvalue.sigTerms,
                             Variance = allTerms %in% CC.variance.sigTerms)))

ggplot(data = data.frame(PValue   = rank(score(CC.weight01))[allTerms],
                         Variance = rank(score(CCvar.weight01))[allTerms]),
       mapping = aes(x = PValue, y = Variance)) +
   geom_point() + geom_smooth(method='lm') + xlab('Using gene p-values') +
   ylab('Using portion of explained variance') +
   ggtitle('Ordering of CC terms by significance')

Session info

I save the main variables to be able to load them in a new R session later.

save(allgenes2GO,
     GOdata.BP, BP.elim, BP.weight01, BP.lea, BP.pvalue.sigTerms,
     GOdata.MF, MF.elim, MF.weight01, MF.lea, MF.pvalue.sigTerms,
     GOdata.CC, CC.elim, CC.weight01, CC.lea, CC.pvalue.sigTerms,
     GOdataVar.BP, BPvar.elim, BPvar.weight01, BPvar.lea, BP.variance.sigTerms,
     GOdataVar.MF, MFvar.elim, MFvar.weight01, MFvar.lea, MF.variance.sigTerms,
     GOdataVar.CC, CCvar.elim, CCvar.weight01, CCvar.lea, CC.variance.sigTerms,
     file = paste('Enrichment', TAG, VAR, 'RData', sep='.'))
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=ca_ES.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=ca_ES.UTF-8    
##  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=ca_ES.UTF-8   
##  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] grid      stats4    parallel  stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] Rgraphviz_2.30.0     ggplot2_3.3.0        limma_3.42.2        
##  [4] knitr_1.28           topGO_2.38.1         SparseM_1.78        
##  [7] GO.db_3.10.0         AnnotationDbi_1.48.0 IRanges_2.20.2      
## [10] S4Vectors_0.24.4     Biobase_2.46.0       graph_1.64.0        
## [13] BiocGenerics_0.32.0 
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.0.0   xfun_0.13          purrr_0.3.4        splines_3.6.3     
##  [5] lattice_0.20-41    colorspace_1.4-1   vctrs_0.2.4        htmltools_0.4.0   
##  [9] yaml_2.2.1         mgcv_1.8-31        blob_1.2.1         rlang_0.4.6       
## [13] pillar_1.4.3       glue_1.4.0         withr_2.2.0        DBI_1.1.0         
## [17] bit64_0.9-7        matrixStats_0.56.0 lifecycle_0.2.0    stringr_1.4.0     
## [21] munsell_0.5.0      gtable_0.3.0       memoise_1.1.0      evaluate_0.14     
## [25] labeling_0.3       highr_0.8          Rcpp_1.0.4.6       scales_1.1.0      
## [29] farver_2.0.3       bit_1.1-15.2       digest_0.6.25      stringi_1.4.6     
## [33] dplyr_0.8.5        tools_3.6.3        magrittr_1.5       RSQLite_2.2.0     
## [37] tibble_3.0.1       crayon_1.3.4       pkgconfig_2.0.3    ellipsis_0.3.0    
## [41] Matrix_1.2-18      assertthat_0.2.1   rmarkdown_2.1      R6_2.4.1          
## [45] nlme_3.1-147       compiler_3.6.3